
    Path integral policy improvement with differential dynamic programming

    Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively by reward-weighted averaging of the corresponding rollouts. It had been conjectured that PI2-CMA implicitly exploits gradient information contained in these reward-weighted statistics. To our knowledge, we are the first to expose the principle behind this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially recovers gradient information analogous to the forward and backward passes of the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive it independently as a generalization of the PI2-CMA framework. Our derivations suggest small variations to SaDDP that increase performance. We validate our claims on a robot trajectory learning task.
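
    Below is a minimal NumPy sketch of the reward-weighted averaging step that PI2-CMA builds on: Gaussian-distributed policy parameters are sampled, rolled out, and re-fitted with exponentiated-cost weights. The function rollout_cost, the temperature, and the sample count are illustrative assumptions, and the DDP-style feedback term that PI2-DDP adds is not included.

```python
import numpy as np

def pi2_cma_update(theta_mean, theta_cov, rollout_cost, n_samples=32,
                   temperature=10.0, rng=None):
    """One reward-weighted update of a Gaussian policy-parameter distribution,
    in the spirit of PI2-CMA. `rollout_cost(theta)` is assumed to return the
    scalar cost of executing the policy parametrized by `theta`."""
    rng = np.random.default_rng() if rng is None else rng

    # Sample perturbed policy parameters and evaluate them on rollouts.
    samples = rng.multivariate_normal(theta_mean, theta_cov, size=n_samples)
    costs = np.array([rollout_cost(th) for th in samples])

    # Exponentiated, normalized costs act as softmax weights (path-integral weighting).
    z = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-12)
    weights = np.exp(-temperature * z)
    weights /= weights.sum()

    # Reward-weighted averaging of the rollouts updates the mean and, CMA-style,
    # the covariance of the parameter distribution.
    new_mean = weights @ samples
    centred = samples - theta_mean
    new_cov = (weights[:, None] * centred).T @ centred + 1e-9 * np.eye(len(theta_mean))
    return new_mean, new_cov
```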

    Information-Theoretic Policy Extraction from Partial Observations

    We investigate the problem of extracting a control policy from one or multiple partial observation sequences. To this end, we cast the problem as a Controlled Hidden Markov Model. We then sketch two information-theoretic approaches to extract a policy, which we refer to as A Posterior Control Distributions. The performance of both methods is investigated and compared empirically on a linear tracking problem.
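
    As an illustration of the setting only, the sketch below combines standard forward-backward smoothing in a hidden Markov model (with, for simplicity, a fixed transition matrix rather than an action-dependent one) with a posterior-weighted, state-conditional action distribution. It is a hypothetical stand-in for the paper's methods, whose exact constructions are not reproduced here.

```python
import numpy as np

def smoothed_state_posteriors(pi0, A, B, obs):
    """Forward-backward smoothing for a discrete HMM.
    pi0: (S,) initial distribution, A: (S,S) transitions with A[i, j] = p(s'=j | s=i),
    B: (S,O) emission matrix, obs: sequence of observation indices."""
    T, S = len(obs), len(pi0)
    alpha = np.zeros((T, S)); beta = np.zeros((T, S))
    alpha[0] = pi0 * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                      # filtering pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # smoothing pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def posterior_control_distribution(gamma, actions, n_states, n_actions):
    """State-conditional action distribution obtained by weighting the recorded
    actions with the smoothed state posteriors (illustrative only)."""
    counts = np.zeros((n_states, n_actions))
    for t, a in enumerate(actions):
        counts[:, a] += gamma[t]
    counts += 1e-12
    return counts / counts.sum(axis=1, keepdims=True)
```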

    On entropy regularized Path Integral Control for trajectory optimization

    In this article, we present a generalized view on Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly, yielding a formal optimal state trajectory distribution. In this contribution, we first review the PIC theory and discuss related algorithms tailored to policy search in general. We identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross-entropy between the optimal state trajectory distribution and the one induced by a parametric stochastic policy. Inspired by this observation, we then aim to formulate a SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. The problem is closely related to the Entropy Regularized Stochastic Optimal Control setting, which has lately received considerable attention from the Reinforcement Learning (RL) community. We analyze the theoretical convergence behavior of the resulting state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.
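
    The generic design strategy mentioned in the abstract can be hinted at with a small sketch: trajectories sampled under the current policy receive path-integral weights exp(-cost/lambda), and a parametric policy is re-fitted by weighted maximum likelihood, which is the cross-entropy minimization step. The linear-Gaussian, time-invariant policy, the temperature lambda, and the regularization below are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def pic_cross_entropy_policy_fit(states, actions, costs, lam=1.0, reg=1e-8):
    """Weighted maximum-likelihood (cross-entropy) fit of a linear-Gaussian policy
    u = K @ x + k to sampled rollouts, with path-integral weights exp(-cost / lam).
    states: (N, T, nx), actions: (N, T, nu), costs: (N,) total trajectory costs."""
    # Path-integral weights per trajectory (shifted for numerical stability).
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()

    N, T, nx = states.shape
    nu = actions.shape[-1]
    # Stack all time steps; repeat each trajectory's weight for its T steps.
    X = np.concatenate([states.reshape(N * T, nx), np.ones((N * T, 1))], axis=1)
    U = actions.reshape(N * T, nu)
    W = np.repeat(w / T, T)

    # Weighted least squares == maximum likelihood for the Gaussian policy mean.
    G = X.T @ (W[:, None] * X) + reg * np.eye(nx + 1)
    theta = np.linalg.solve(G, X.T @ (W[:, None] * U))   # (nx+1, nu)
    K, k = theta[:nx].T, theta[nx]
    return K, k
```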

    Optimizing state trajectories using surrogate models with application on a mechatronic example

    The classic design and simulation methodologies that constitute today's engineer's main tools fall behind industry's ever-increasing complexity. The drive for technological advancement brings new performance requirements, and optimality is no longer a concern limited to regime operation. Since the corresponding dynamic optimization problems incorporate accurate system models, current techniques are hampered by the high computational cost that these multi-disciplinary, high-dimensional system models carry with them. This imbalance advocates the need to adapt existing approaches. In this study we propose an algorithmic framework as an extension of the direct transcription method, which has already proven its usefulness in this respect. We suggest constructing a surrogate model of the derivative function that is iteratively refined in a region of interest. The method is then illustrated on an academic yet nonlinear example.
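
    A minimal, self-contained sketch of such a refinement loop on a scalar toy system is given below: a local least-squares surrogate of the derivative function is fitted in a region of interest, the transcribed problem is solved with the surrogate, and the region is re-centred and shrunk. The toy dynamics f_true, the quadratic basis, and the cost are assumptions for illustration and not the paper's mechatronic example.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative "expensive" derivative function (stands in for a multi-physics model).
def f_true(x, u):
    return -0.5 * x + np.sin(x) + u

def fit_surrogate(x_c, u_c, radius, n_samples=60, rng=np.random.default_rng(0)):
    """Least-squares quadratic surrogate of the derivative function, fitted only
    inside the current region of interest around (x_c, u_c)."""
    xs = x_c + radius * rng.uniform(-1, 1, n_samples)
    us = u_c + radius * rng.uniform(-1, 1, n_samples)
    Phi = np.stack([np.ones_like(xs), xs, us, xs**2, xs * us, us**2], axis=1)
    coeff, *_ = np.linalg.lstsq(Phi, f_true(xs, us), rcond=None)
    return lambda x, u: coeff @ np.array([1.0, x, u, x**2, x * u, u**2])

def transcribe_and_solve(f_hat, x0=1.0, N=20, dt=0.1):
    """Direct transcription using the surrogate: decision vector z = [x_1..x_N, u_0..u_{N-1}],
    explicit-Euler defect constraints, quadratic tracking-plus-effort cost."""
    def unpack(z):
        return np.concatenate(([x0], z[:N])), z[N:]
    def cost(z):
        x, u = unpack(z)
        return dt * np.sum(x[1:]**2 + 0.1 * u**2)
    def defects(z):
        x, u = unpack(z)
        return np.array([x[k + 1] - x[k] - dt * f_hat(x[k], u[k]) for k in range(N)])
    res = minimize(cost, np.zeros(2 * N), method="SLSQP",
                   constraints={"type": "eq", "fun": defects})
    return unpack(res.x)

# Iterative refinement: re-fit the surrogate around the latest solution and re-solve.
x_c, u_c, radius = 1.0, 0.0, 2.0
for _ in range(3):
    f_hat = fit_surrogate(x_c, u_c, radius)
    x_opt, u_opt = transcribe_and_solve(f_hat)
    x_c, u_c, radius = np.mean(x_opt), np.mean(u_opt), 0.5 * radius
```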

    Polynomial Chaos reformulation in Nonlinear Stochastic Optimal Control with application on a drivetrain subject to bifurcation phenomena

    This paper discusses a method enabling optimal control of nonlinear systems that are subject to parametric uncertainty. A stochastic optimal tracking problem is formulated that can be expressed as a function of the first two stochastic moments of the state. The proposed formulation allows system performance and system robustness to be penalized independently. The use of polynomial chaos expansions is investigated to arrive at a computationally tractable formulation that expresses the stochastic moments rigorously as functions of the polynomial expansion coefficients. It is then demonstrated how the stochastic optimal control problem can be reformulated as a deterministic optimal control problem in terms of these coefficients. The proposed method is applied to find a robust control input for the start-up of an eccentrically loaded drivetrain that is inherently prone to bifurcation behaviour. A reference trajectory is chosen to deliberately provoke a bifurcation. The proposed framework is able to avoid the bifurcation behaviour in all cases.
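
    The property the reformulation relies on can be illustrated for a one-dimensional expansion with a Gaussian germ: by orthogonality of the probabilists' Hermite polynomials, the first two moments are explicit functions of the expansion coefficients, so penalties on mean and variance become deterministic functions of those coefficients. The coefficient values below are arbitrary illustrative numbers.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def pce_moments(coeffs):
    """Mean and variance of a scalar quantity expanded in probabilists' Hermite
    polynomials, x(xi) = sum_k coeffs[k] * He_k(xi), with xi ~ N(0, 1).
    Orthogonality, E[He_j He_k] = k! * delta_jk, makes both moments explicit
    functions of the expansion coefficients."""
    mean = coeffs[0]
    var = sum(c**2 * factorial(k) for k, c in enumerate(coeffs) if k >= 1)
    return mean, var

# Sanity check against Monte Carlo evaluation of the same expansion.
coeffs = np.array([1.0, 0.3, -0.2, 0.05])
xi = np.random.default_rng(1).standard_normal(200_000)
samples = hermeval(xi, coeffs)
print(pce_moments(coeffs), (samples.mean(), samples.var()))
```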

    Polynomial chaos explicit solution of the optimal control problem in model predictive control

    A difficulty still hindering the widespread application of Model Predictive Control (MPC) methodologies is the computational burden of solving the associated Optimal Control (OC) problem in every control period. In contrast to the numerous approximation techniques that pursue acceleration of the online optimization procedure, relatively little work has been devoted to shifting the optimization effort to a precomputation phase, especially for nonlinear system dynamics. Recently, interest in the theory of generalized Polynomial Chaos (gPC) has revived as a means to appraise the influence of variable parameters on dynamic system behaviour, and it has proven to yield reliable results. This article establishes an explicit solution of the multi-parametric Nonlinear Problem (mp-NLP) based on the theoretical framework of gPC, which enables the formulation of a polynomially approximated nonlinear feedback law. The resulting online computation is cheap enough for real-time MPC, with control frequencies up to 2 kHz.
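
    The offline/online split can be sketched as follows: optimal inputs are precomputed over a sampled range of states, a polynomial feedback law is fitted to them, and the online controller only evaluates that polynomial. The toy plant, the state grid, and the plain least-squares fit below are stand-ins for the gPC-based mp-NLP construction in the article.

```python
import numpy as np
from scipy.optimize import minimize
from numpy.polynomial import polynomial as P

# Toy nonlinear plant used to illustrate the offline/online split.
def step(x, u, dt=0.05):
    return x + dt * (x - x**3 + u)

def first_optimal_input(x0, N=15):
    """Solve a small finite-horizon OCP by single shooting and return u_0*(x0)."""
    def cost(u_seq):
        x, c = x0, 0.0
        for u in u_seq:
            x = step(x, u)
            c += x**2 + 0.1 * u**2
        return c
    res = minimize(cost, np.zeros(N), method="L-BFGS-B")
    return res.x[0]

# Offline phase: sample the state space, solve the OCP at the samples, and fit a
# polynomial feedback law u*(x) ~ p(x).
x_grid = np.linspace(-2.0, 2.0, 41)
u_star = np.array([first_optimal_input(x) for x in x_grid])
p = P.polyfit(x_grid, u_star, deg=7)

# Online phase: evaluating the polynomial is the only per-sample work, which is
# what makes kHz-range control rates plausible.
def explicit_mpc(x):
    return P.polyval(x, p)

print(explicit_mpc(0.7), first_optimal_input(0.7))
```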

    Model-based feedforward targeting of magnetic microparticles in fluids using dynamic optimization

    External magnetic field gradients generated by electromagnets can exert forces on ferromagnetic microparticles, enabling precise local targeting of these particles. To steer the magnetic particles from their initial position to a desired target zone in a fluid, a control strategy for the proper activation of the electromagnets is required. We propose a model-based control strategy that performs dynamic optimization with respect to a given metric and yields an optimal particle trajectory; here, minimum power consumption of the electromagnets is used as the metric. Furthermore, a dynamic model of the magnetic and fluidic forces acting on the particles is incorporated in the dynamic optimization. The results show the benefits of the presented approach, since it allows the electromagnets to be controlled in open loop.
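
    A much-simplified, one-dimensional version of such a feedforward optimization is sketched below: an overdamped particle is driven by a distance-dependent magnetic force, and the coil current profile that minimizes a power-like cost while reaching the target is found by direct transcription. All constants and the force model are made up for illustration and do not come from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative 1-D, overdamped particle model: Stokes drag balances the magnetic
# force, which decays with distance to a fixed electromagnet at x = L.
L, drag, c_mag = 10e-3, 1.0e-6, 2.0e-9        # coil position [m], drag [N s/m], force gain
dt, N, x0, x_target = 0.05, 60, 0.0, 5e-3      # horizon and boundary conditions

def simulate(currents):
    """Forward simulation of the particle position for a given current profile."""
    x = np.empty(N + 1); x[0] = x0
    for k in range(N):
        force = c_mag * currents[k] / (1.0 + ((L - x[k]) / L) ** 2)
        x[k + 1] = x[k] + dt * force / drag    # overdamped: velocity = force / drag
    return x

def power(currents):                            # coil power taken as sum of squared currents
    return dt * np.sum(currents ** 2)

# Direct transcription over the current profile, with a terminal constraint on the
# particle position; the result is an open-loop (feedforward) activation profile.
res = minimize(power, 0.5 * np.ones(N), method="SLSQP",
               constraints={"type": "eq", "fun": lambda i: simulate(i)[-1] - x_target},
               bounds=[(0.0, 2.0)] * N)
optimal_currents = res.x
```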